Using an MST based Value for ε in DBSCAN Algorithm for Obtaining Better Result
Author
Abstract
In this paper, an objective function based on the minimal spanning tree (MST) of the data points is proposed for clustering, and a density-based clustering technique is used in an attempt to optimize this objective function in order to detect the "natural grouping" present in a given data set. A threshold based on the MST of the data points of each cluster thus found is used to remove noise (if any is present in the data) from the final clustering. The experimental results obtained by the DBSCAN (Density Based Spatial Clustering of Applications with Noise) algorithm are compared with those of the proposed algorithm, and the proposed algorithm is observed to perform better than DBSCAN. Several experiments on synthetic data sets show the utility of the proposed method. The proposed method has also been found to provide good results for two real-life data sets considered for experimentation. Note that k-means is one of the most popular methods adopted to solve the clustering problem. This algorithm uses an objective function that is based on minimization of a squared error criterion. Although it is useful in many applications, it may not always provide the "natural grouping".
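The general idea of deriving ε from the MST before running DBSCAN can be illustrated with a minimal sketch. The abstract does not state the paper's objective function or threshold, so the rule used here (ε = mean plus one standard deviation of the MST edge lengths) is only an assumed placeholder, and the function names `mst_edge_lengths`, `eps_from_mst`, and `dbscan` are hypothetical. The sketch uses only the Python standard library and a brute-force O(n²) neighbour search, so it is suitable for small data sets only.

```python
import math
from statistics import mean, pstdev


def mst_edge_lengths(points):
    """Edge lengths of a Euclidean MST, built with Prim's algorithm in O(n^2)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known link from each point to the tree
    best[0] = 0.0          # start the tree at point 0
    lengths = []
    for step in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if step > 0:       # the root contributes no edge
            lengths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return lengths


def eps_from_mst(points):
    """ASSUMED heuristic (not the paper's formula): mean + std of MST edges."""
    lengths = mst_edge_lengths(points)
    return mean(lengths) + pstdev(lengths)


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 marking noise."""
    n = len(points)
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1             # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster    # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # expand only through core points
    return labels


# Two tight groups of four points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 20)]
eps = eps_from_mst(points)
labels = dbscan(points, eps, min_pts=3)
print(labels)  # → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this toy example the long MST edges that bridge the groups raise ε well above the within-group spacing but keep it below the gap between groups, so the isolated point is labelled as noise. A different statistic of the MST edge lengths would give a different ε, which is why the rule above should be read as an illustration rather than as the method proposed in the paper.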
Similar resources
Improvement of density-based clustering algorithm using modifying the density definitions and input parameter
Clustering is one of the main tasks in data mining, which means grouping similar samples. In general, there is a wide variety of clustering algorithms. One of these categories is density-based clustering. Various algorithms have been proposed for this method; one of the most widely used is DBSCAN. DBSCAN can identify clusters of different shapes in the dataset and automatically i...
A study of the problems of the DBSCAN clustering algorithm and a review of the improvements proposed for it
Clustering is an important knowledge discovery technique in databases. Density-based clustering algorithms are one of the main methods for clustering in data mining. These algorithms have some special features, including independence from the shape of the clusters, being highly understandable, and ease of use. DBSCAN is a base algorithm for density-based clustering algorithms. DBSCAN is able to...
An Improved Initialization Method For Fuzzy C-Means Clustering Using Density Based Approach For Microarray Data
An improved initialization method for the fuzzy c-means (FCM) method is proposed which aims at solving the two important issues of clustering performance affected by initial cluster centers and number of clusters. A density based approach is needed to identify the closeness of the data points and to extract cluster center. DBSCAN approach defines ε–neighborhood of a point to determine the core objec...
Faster DBScan and HDBScan in Low-Dimensional Euclidean Spaces
We present a new algorithm for the widely used density-based clustering method dbscan. Our algorithm computes the dbscan-clustering in O(n log n) time in R, irrespective of the scale parameter ε (and assuming the second parameter MinPts is set to a fixed constant, as is the case in practice). Experiments show that the new algorithm is not only fast in theory, but that a slightly simplified vers...
An efficient and scalable density-based clustering algorithm for datasets with complex structures
As a research branch of data mining, clustering, as an unsupervised learning scheme, focuses on assigning objects in the dataset into several groups, called clusters, without any prior knowledge. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is one of the most widely used clustering algorithms for spatial datasets, which can detect any shapes of clusters and can automatic...